Exploring metadata of the human vaginal microbiome

Group 6

Alberte Englund
Mathilde Due
Line Winther Gormsen
Sigrid Frandsen
Kristine Johansen

STUDY DESCRIPTION

Metadata from MGnify’s vaginal microbiome genome catalogue

  • Uncover patterns in genome quality, taxonomic composition, and ecological characteristics.

  • Uncover potential patterns for diagnosis of endometriosis via associated pathogens:

    • Anaerococcus, Ureaplasma, Gardnerella, Veillonella, Corynebacterium, Peptoniphilus, Candida albicans, Alloscardovia

DATA CLEANING AND WRANGLING

Untidy → tidy data

  1. Each variable is saved in its own column.
  2. Each observation is saved in its own row.
  3. Each “type” of observation is stored in a single table.

Raw metadata (618 rows, 20 columns):
# A tibble: 618 × 20
   Genome        Genome_type  Length N_contigs    N50 GC_content Completeness
   <chr>         <chr>         <dbl>     <dbl>  <dbl>      <dbl>        <dbl>
 1 MGYG000303700 MAG          678213         2 466332       47.8         63.7
 2 MGYG000303701 MAG         1500176        18 112881       42.4         87.8
 3 MGYG000303702 MAG         1210062        44  48790       26.4         94.8
 4 MGYG000303703 MAG         1706016        27  89653       44.6         93.7
 5 MGYG000303704 MAG          703182         7 111709       47.8         63.7
 6 MGYG000303705 MAG         2542045       112  34925       48           97.9
 7 MGYG000303706 MAG         1449687       185  10153       34.8         85.2
 8 MGYG000303707 MAG         1874692        90  28768       37.1         99.0
 9 MGYG000303708 MAG         1480380        12 169949       42.2         87.6
10 MGYG000303709 MAG          694644        57  15063       47.9         62.0
# ℹ 608 more rows
# ℹ 13 more variables: Contamination <dbl>, rRNA_5S <dbl>, rRNA_16S <dbl>,
#   rRNA_23S <dbl>, tRNAs <dbl>, Genome_accession <chr>, Species_rep <chr>,
#   Lineage <chr>, Sample_accession <chr>, Study_accession <chr>,
#   Country <chr>, Continent <chr>, FTP_download <chr>
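
Tidy data (618 rows, 25 columns; Lineage split into taxonomic ranks, quality labels and endometriosis flag added, accession and download columns dropped):
# A tibble: 618 × 25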
   Genome        Genome_type  Length N_contigs    N50 GC_content Completeness
   <chr>         <chr>         <dbl>     <dbl>  <dbl>      <dbl>        <dbl>
 1 MGYG000303700 MAG          678213         2 466332       47.8         63.7
 2 MGYG000303701 MAG         1500176        18 112881       42.4         87.8
 3 MGYG000303702 MAG         1210062        44  48790       26.4         94.8
 4 MGYG000303703 MAG         1706016        27  89653       44.6         93.7
 5 MGYG000303704 MAG          703182         7 111709       47.8         63.7
 6 MGYG000303705 MAG         2542045       112  34925       48           97.9
 7 MGYG000303706 MAG         1449687       185  10153       34.8         85.2
 8 MGYG000303707 MAG         1874692        90  28768       37.1         99.0
 9 MGYG000303708 MAG         1480380        12 169949       42.2         87.6
10 MGYG000303709 MAG          694644        57  15063       47.9         62.0
# ℹ 608 more rows
# ℹ 18 more variables: Contamination <dbl>, rRNA_5S <dbl>, rRNA_16S <dbl>,
#   rRNA_23S <dbl>, tRNAs <dbl>, Country <chr>, Continent <chr>, Domain <chr>,
#   Phylum <chr>, Class <chr>, Order <chr>, Family <chr>, Genus <chr>,
#   Species <chr>, Completeness_quality <chr>, Contamination_quality <chr>,
#   Overall_quality <chr>, endometriosis_associated <lgl>
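
The step from the single Lineage column to one column per taxonomic rank can be sketched as follows. This is a minimal Python illustration, not the code used for the slides (the tibble output suggests the wrangling was done in R). It assumes that Lineage uses the semicolon-separated, prefix-tagged format `d__...;p__...;...;s__...`; the function name `split_lineage` and the example string are hypothetical.

```python
# Rank names match the columns of the tidy table.
RANKS = ["Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def split_lineage(lineage):
    """Split an assumed 'd__X;p__Y;...' lineage string into one value per rank."""
    values = []
    for part in lineage.split(";"):
        # Drop the rank prefix such as 'd__' or 'p__'; keep only the name.
        _, _, name = part.partition("__")
        # An empty name (e.g. 's__') means the rank is unassigned.
        values.append(name or None)
    # Pad so that every rank has an entry even if the lineage is truncated.
    values += [None] * (len(RANKS) - len(values))
    return dict(zip(RANKS, values))


# Illustrative lineage string (made up for this example, species unassigned).
example = (
    "d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Actinomycetales;"
    "f__Bifidobacteriaceae;g__Gardnerella;s__"
)
row = split_lineage(example)
print(row["Genus"], row["Species"])
```

Storing unassigned ranks as missing values rather than empty strings keeps later grouping and filtering steps explicit about what is unknown.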

DATA DESCRIPTION

  • 618 vaginal metagenome-assembled genomes (MAGs)
  • 25 variables covering taxonomy, assembly quality, and geography
  • High completeness and low contamination for most genomes
  • Dataset dominated by a few major bacterial phyla
  • Genome lengths fall within biologically expected ranges

Most MAGs belong to only a few dominant phyla.
This indicates strong taxonomic skew in the dataset.


Most genomes have high completeness (>90%),
indicating generally strong assembly quality.
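
The quality labels in the tidy table (Completeness_quality, Contamination_quality, Overall_quality) could be derived along these lines. The cutoffs below are an assumption: they follow the commonly used MIMAG-style thresholds (high: more than 90% complete and less than 5% contaminated; medium: at least 50% complete and less than 10% contaminated) and are not confirmed by the slides. The contamination value in the usage lines is hypothetical.

```python
LEVELS = ["low", "medium", "high"]  # ordered from worst to best


def completeness_quality(completeness):
    """Label completeness (percent); cutoffs are assumed, not from the slides."""
    if completeness > 90:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"


def contamination_quality(contamination):
    """Label contamination (percent); lower contamination means higher quality."""
    if contamination < 5:
        return "high"
    if contamination < 10:
        return "medium"
    return "low"


def overall_quality(completeness, contamination):
    """Overall label is the worse of the two individual labels."""
    labels = [completeness_quality(completeness), contamination_quality(contamination)]
    return min(labels, key=LEVELS.index)


# Completeness values taken from the table; contamination of 1.2 is made up.
print(overall_quality(97.9, 1.2))
print(overall_quality(63.7, 1.2))
```

Taking the worse of the two labels means a genome only counts as high quality when it passes both criteria.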


Genome lengths fall within the expected biological range
for vaginal bacterial taxa (typically 1.5–3 Mb).

ANALYSIS 1

ANALYSIS 2

ANALYSIS 3 - Endometriosis-associated and non-associated MAGs

  • Compared endometriosis-associated vs non-associated MAGs
  • Focused on GC content, genome length, completeness & contamination
  • Investigated whether associated MAGs cluster taxonomically
  • Goal: determine if associated MAGs form a genomically distinct group
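
A minimal sketch of this comparison, assuming the endometriosis_associated flag is set from the list of associated taxa in the study description and that groups are compared by their medians (the slides do not state which summary was used). The rows in `mags` are invented toy values, not data from the catalogue, and the function names are hypothetical.

```python
from statistics import median

# Taxa listed as endometriosis-associated in the study description.
ASSOCIATED_GENERA = {
    "Anaerococcus", "Ureaplasma", "Gardnerella", "Veillonella",
    "Corynebacterium", "Peptoniphilus", "Alloscardovia",
}
ASSOCIATED_SPECIES = {"Candida albicans"}


def is_associated(genus, species):
    """Flag a MAG if its genus or species is on the associated list."""
    return genus in ASSOCIATED_GENERA or species in ASSOCIATED_SPECIES


def group_medians(rows, column):
    """Median of one numeric column for associated (True) and other (False) MAGs."""
    groups = {True: [], False: []}
    for r in rows:
        groups[is_associated(r["Genus"], r["Species"])].append(r[column])
    return {flag: median(vals) for flag, vals in groups.items() if vals}


# Toy rows with invented GC values, for illustration only.
mags = [
    {"Genus": "Gardnerella", "Species": None, "GC_content": 42.0},
    {"Genus": "Anaerococcus", "Species": None, "GC_content": 30.0},
    {"Genus": "Lactobacillus", "Species": None, "GC_content": 37.0},
    {"Genus": "Prevotella", "Species": None, "GC_content": 41.0},
]
print(group_medians(mags, "GC_content"))
```

The same helper can be applied to Length, Completeness, and Contamination to cover the other variables in the comparison.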

Endometriosis-associated MAGs occur in only a few phyla.
Most phyla contain no associated MAGs, so the associated group is confined to a narrow taxonomic range.


The GC content ranges of the two groups overlap almost completely.
There is no evidence that GC content distinguishes associated from non-associated MAGs.

ANALYSIS 4 - Species distribution between countries

  • Investigated the distribution of lineage groups across countries
  • Counted group instances for each country and pivoted to wide format
  • Filtered out rows with NA in Country
  • Large differences in sample size between countries → normalized the counts
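
The counting and normalization steps above can be sketched as follows. This assumes that normalization means dividing each count by the country's total number of MAGs, giving within-country proportions; the slides do not state which normalization was used. The country labels "A" and "B" and the rows are invented for illustration.

```python
from collections import Counter, defaultdict


def order_proportions(rows):
    """Count MAGs per (Country, Order) and return a wide table of proportions."""
    counts = defaultdict(Counter)
    for r in rows:
        if r["Country"] is None:
            continue  # drop MAGs without a country, as in the NA filter
        counts[r["Country"]][r["Order"]] += 1
    # One column per order seen in any country (the wide format).
    orders = sorted({o for c in counts.values() for o in c})
    wide = {}
    for country, c in counts.items():
        total = sum(c.values())
        # Dividing by the country total makes countries of different size comparable.
        wide[country] = {o: c[o] / total for o in orders}
    return wide


# Toy rows, invented for illustration only.
rows = (
    [{"Country": "A", "Order": "Lactobacillales"}] * 3
    + [{"Country": "A", "Order": "Bacteroidales"}]
    + [{"Country": "B", "Order": "Lactobacillales"}]
    + [{"Country": "B", "Order": "Bacteroidales"}]
    + [{"Country": None, "Order": "Bacteroidales"}]
)
print(order_proportions(rows))
```

Each row of the resulting table sums to 1, so a heatmap or PCA built on it reflects composition rather than sampling depth.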

There is some variation between countries, e.g. in the order Bacteroidales, but not much.

The differences were not tested for statistical significance; this could have been done.

Only two principal components explain 100% of the variance.
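
One possible explanation, stated as an assumption because the number of countries in the PCA is not given here: if the PCA was run on one normalized profile per country, then centering the data removes one degree of freedom, and the number of components with non-zero variance is bounded by the number of countries minus one. With n country profiles and p lineage groups:

```latex
X \in \mathbb{R}^{n \times p}, \qquad
X_c = X - \mathbf{1}\,\bar{x}^{\top}, \qquad
\mathbf{1}^{\top} X_c = 0
\;\Longrightarrow\;
\operatorname{rank}(X_c) \le \min(n - 1,\; p)
```

Under this assumption, three countries would give at most two informative components, so PC1 and PC2 together capturing all of the variance would be expected rather than a sign of unusually strong structure.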

The countries separate clearly in the PCA plot.

The separation along PC1 involves Fusobacteria and Bacteroidota (order Bacteroidales), which agrees with the heatmap.

DISCUSSION

  • High-quality MAGs with good completeness
  • No strong genomic differences between groups
  • Limited metadata and uneven sampling

FUTURE PERSPECTIVES

  • Improve clinical + geographic metadata

CONCLUSION